Data Viz Design Principles

Stat 365: Statistical Communication

Wednesday, May 1st

Today we will…

  • Slow Reveal
  • Visual & Graphical Perception
  • Simplification
  • Color
  • Design for accessibility

Slow Reveal

“What do you notice? What do you wonder?”

“What new information did you just learn?”

“What do you think this graph is about? Why?”

“What would you mark along the axes?”

Visual Perception

The gestalt effect

  • gestalt = form or pattern

  • Gestalt philosophy: “The whole is other than the sum of the parts” - Kurt Koffka

  • Gestalt principles: Predictable ways in which we organize sensory information.

Gestalt Hierarchy

Effect        Graph feature
Enclosure     Facets
Connection    Lines
Proximity     White space
Similarity    Color/shape

Some things are processed more slowly

Others are incredibly fast

Fast = “pre-attentive processing”

  • Things that happen in <200ms of visual stimulation
  • Performed in parallel across the entire visual field

“An understanding of what is processed pre-attentively is probably the most important contribution that visual science can make to data visualization” (Ware, 2004, p. 19)

Pre-attentive features

Color (hue)

Orientation

Intensity

Clustering

Size

Length

and more!

Pre-attentive processing facilitates:

  • Target detection (Presence or absence)
  • Boundary detection / grouping
  • Region tracking
  • Counting and estimation

Graphical Perception

Cleveland and McGill, 1984

The following are the 10 elementary perceptual tasks in Figure 1, ranked from most to least accurately judged (tasks that share a rank are listed together):

  1. Position along a common scale
  2. Positions along nonaligned scales
  3. Length, direction, angle
  4. Area
  5. Volume, curvature
  6. Shading, color saturation

“Apparent” magnitude

Implications for Practice

  1. Know how we perceive groups

  2. Know that we perceive some groups before others

  3. Design to facilitate and emphasize the most important comparisons

Case Study on Gestalt Principles

Questions

Client Question

Do historical trends in the full-time weekly median price charged for family childcare vary by development stage in the three given California counties?

Consultant Questions

What are the elemental groupings? (What belongs together?)

How do we design for effective emphasis?

Childcare Data

What are the elemental groupings?

Most elemental

  1. Weekly costs
  2. Years
  3. Development stage (infant vs toddler)
  4. County

Least elemental
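
One way to turn this ranking into a chart, sketched in R with ggplot2 under stated assumptions: the most elemental pieces (weekly cost and year) go on position scales, development stage is grouped by connection and similarity (lines and color), and county is grouped by enclosure (facets). The data frame `childcare`, its column names, and every value in it are invented for illustration; they are not the childcare prices from class.

```r
library(ggplot2)

# Hypothetical toy data: all values are invented for illustration only.
childcare <- expand.grid(
  year = 2008:2018,
  stage = c("infant", "toddler"),
  county_name = c("Orange County", "San Francisco County",
                  "San Luis Obispo County"),
  stringsAsFactors = FALSE
)
childcare$weekly_cost <- 150 + 8 * (childcare$year - 2008) +
  ifelse(childcare$stage == "infant", 25, 0)

childcare_plot <- ggplot(
  childcare,
  aes(x = year, y = weekly_cost, color = stage)  # position + similarity
) +
  geom_line() +                # connection: one line per development stage
  geom_point() +
  facet_wrap(~ county_name) +  # enclosure: one panel per county
  scale_x_continuous(breaks = seq(2008, 2018, by = 2)) +
  labs(
    x = "Year",
    y = "Weekly Median Childcare Cost ($)",
    color = "Development Stage"
  )

childcare_plot
```

Because `facet_wrap()` uses fixed scales by default, the three panels share the same axes, which keeps the infant versus toddler comparison easy both within a county and across counties.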

Design Principles

  1. Expressiveness
  • Encode all the facts. Encode only the facts.
  2. Consistency
  • Use consistent axes when comparing charts.
  • A note on legends: order items according to appearance.
  • Avoid visually similar encodings for independent variables.
  3. Importance ordering
  • Order groups by size, from largest to smallest.
  • Simplify, simplify, simplify!
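
A minimal base R sketch of importance ordering, assuming a small made-up table of group sizes (the names `enrollment`, `region`, and `n` are hypothetical). The idea is to store the groups as a factor whose levels run from the largest group to the smallest.

```r
# Hypothetical group sizes, invented for illustration only.
enrollment <- data.frame(
  region = c("North", "South", "East"),
  n = c(12, 30, 7)
)

# reorder() sorts factor levels by the second argument in increasing order,
# so negating n puts the largest group first.
enrollment$region <- reorder(factor(enrollment$region), -enrollment$n)

levels(enrollment$region)
# "South" "North" "East"
```

ggplot2 draws discrete axes, legends, and facets in factor-level order, so a factor ordered this way should carry the largest-to-smallest ordering into the chart.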

Tufte and the data-ink ratio

“Color used poorly is worse than no color at all” - Edward Tufte

Color

Best Practices for Color Encodings

  1. The order of colors is intuitive, natural, & easy to remember.
  2. Colors should be sufficiently separable.
  3. Differences between colors should be uniform and linear.
  4. The progression of colors should be smooth.
  5. Colors should be equally visually important.
  6. Colors should be robust for color-blind individuals.
  7. Colors should be robust to contrast effects.
  8. Colors should be robust to shadows.
  9. Colors should be sensitive to the background.
  10. Colors should remain within device gamuts.
  11. Colors should be aesthetically pleasing.
  12. Colors should map intuitively to data.
  13. Colors should use different maps for different variables.
  14. Colors should naturally divide values into low, medium, and high categories.
  15. Avoid the rainbow.
  16. Encodings should highlight prominent values.

Brand Palettes

  • Brand colors are often saturated, to attract attention.
  • Data-visualization colors tend to be less “shouty”.

Color Brewer Scales
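
A short sketch of pulling ColorBrewer palettes into R, assuming the RColorBrewer package is installed. The palette choices ("Blues", "RdBu", "Dark2") are only examples of the three ColorBrewer families, not recommendations from class.

```r
library(RColorBrewer)

# Sequential: ordered data running from low to high.
seq_pal <- brewer.pal(n = 5, name = "Blues")

# Diverging: ordered data with a meaningful midpoint.
div_pal <- brewer.pal(n = 5, name = "RdBu")

# Qualitative: unordered categories.
qual_pal <- brewer.pal(n = 3, name = "Dark2")

# brewer.pal.info records, for each palette, whether it is flagged
# as colorblind friendly.
cb_friendly <- rownames(brewer.pal.info)[brewer.pal.info$colorblind]
cb_friendly
```

In ggplot2 the same palettes are available by name through `scale_color_brewer(palette = "Dark2")` and `scale_fill_brewer()`.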

Color Accessibility: CV Simulator

Even better? Double encode!
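
Double encoding can be sketched in ggplot2 by mapping the same variable to both color and shape, so the groups stay distinguishable even when the colors do not. This example assumes R 4.0 or later for `palette.colors()`; the data frame `toy` and its values are invented for illustration, and the Okabe-Ito palette is one reasonable choice rather than the one used in class.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only.
toy <- data.frame(
  year = rep(c(2008, 2013, 2018), times = 2),
  stage = rep(c("infant", "toddler"), each = 3),
  weekly_cost = c(175, 215, 255, 150, 190, 230)
)

# Take two colors (orange and sky blue) from the Okabe-Ito palette.
# unname() matters: scale_color_manual() matches *named* values to data
# levels, and the palette's own color names would not match "infant"/"toddler".
cvd_safe <- unname(palette.colors(3, palette = "Okabe-Ito")[2:3])

double_encoded <- ggplot(
  toy,
  aes(x = year, y = weekly_cost, color = stage, shape = stage)
) +
  geom_line() +
  geom_point(size = 3) +
  scale_color_manual(values = cvd_safe) +
  scale_shape_manual(values = c(16, 17)) +  # solid circle, solid triangle
  labs(color = "Development Stage", shape = "Development Stage")

double_encoded
```

Giving the color and shape scales the same title lets ggplot2 merge them into a single legend, so the reader sees one key with both cues.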

So what?

  • Use color sparingly
  • Use color consistently
  • Design with color deficiencies in mind
  • Be thoughtful about the tone that color conveys
  • Brand colors? Pick one, maybe two
  • See Design for Accessibility

Data Context

Titles

  • Keep it concise, clear, and descriptive.
  • Use active voice and avoid jargon.
  • Highlight the main message or point of the graphic.
  • Consider using a subtitle to provide additional context or information.
  • You may consider including your legend in the title.

Footnotes

  • Include all necessary information, such as data source.
  • Place the footnote at the bottom of the graphic or immediately after the graphic.
  • Avoid using footnotes to provide additional analysis or interpretation.

Captions

  • Describe what has been plotted.
  • Orient the reader.
  • Point out important features of the plot and implications.
  • It is okay for captions to repeat information found in the text.

Alt text

  • Provide a brief but informative description of the graphic’s content for users who cannot see it.
  • Use plain language and avoid abbreviations or symbols that may be unfamiliar to some users.
  • Describe the key variables, trends, and patterns shown in the graphic.
  • Test the alternative text with screen reader software to ensure it accurately conveys the content of the graphic.

Alt Text in R

.Rmd chunk option: fig.alt =

.qmd chunk option: #| fig-alt:
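
A minimal sketch of what such a chunk might contain, assuming ggplot2 3.3.4 or later. In a .qmd file the `#| fig-alt:` line sits at the top of the R chunk; in an .Rmd file the same text would go in the chunk header as `fig.alt = "..."`. The data frame `toy`, its values, and the alt text itself are invented for illustration. The `labs(alt = )` call is an additional, optional way to keep the alt text with the plot object.

```r
#| fig-alt: "Line chart of a made-up weekly cost that rises steadily from 150 dollars in 2008 to 230 dollars in 2018."
library(ggplot2)

# Hypothetical toy data, invented for illustration only.
toy <- data.frame(
  year = c(2008, 2013, 2018),
  weekly_cost = c(150, 190, 230)
)

alt_text <- paste(
  "Line chart of a made-up weekly cost that rises steadily",
  "from 150 dollars in 2008 to 230 dollars in 2018."
)

toy_plot <- ggplot(toy, aes(x = year, y = weekly_cost)) +
  geom_line() +
  labs(alt = alt_text)  # stores the alt text on the plot object itself

# Read the stored alt text back, e.g. to proofread it.
get_alt_text(toy_plot)
```

Run as a plain R script, the `#|` line is just a comment; it only takes effect when the chunk is rendered by Quarto.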

The chart is comprised of 3 panels containing sub-charts, arranged horizontally. The panels represent different values of county_name. Each sub-chart has x-axis Year with labels 2008, 2010, 2012, 2014, 2016 and 2018. Each sub-chart has y-axis Weekly Median Childcare Cost ($) with labels 150, 200, 250 and 300. There is a legend indicating colour is used to show Development Stage, with 2 levels: infant shown as brilliant blue colour and toddler shown as deep orange yellow colour. There is a legend indicating shape is used to show Development Stage, with 2 levels: infant shown as solid circle shape and toddler shown as solid triangle shape. Each sub-chart has 2 layers. Panel 1 represents data for county_name = Orange County. Layer 1 of panel 1 is a set of 22 points of which about 100% can be seen. Layer 2 of panel 1 is a set of 2 lines. Line 1 connects 11 points. This line has colour brilliant blue which maps to Development Stage = infant. Line 2 connects 11 points. This line has colour deep orange yellow which maps to Development Stage = toddler. Panel 2 represents data for county_name = San Francisco County. Layer 1 of panel 2 is a set of 22 points of which about 100% can be seen. Layer 2 of panel 2 is a set of 2 lines. Line 1 connects 11 points. This line has colour brilliant blue which maps to Development Stage = infant. Line 2 connects 11 points. This line has colour deep orange yellow which maps to Development Stage = toddler. Panel 3 represents data for county_name = San Luis Obispo County. Layer 1 of panel 3 is a set of 22 points of which about 100% can be seen. Layer 2 of panel 3 is a set of 2 lines. Line 1 connects 11 points. This line has colour brilliant blue which maps to Development Stage = infant. Line 2 connects 11 points. This line has colour deep orange yellow which maps to Development Stage = toddler.

The figure shows an overall increasing trend in weekly median childcare costs from 2008 to 2018 across Orange County, San Francisco County, and San Luis Obispo County. Further, the weekly median childcare cost is consistently higher for infants than for toddlers.

BrailleR::VI(my_plot)

This is an untitled chart with no subtitle or caption.
The chart is comprised of 3 panels containing sub-charts, arranged horizontally.
The panels represent different values of county_name.
Each sub-chart has x-axis 'Year' with labels 2008, 2010, 2012, 2014, 2016 and 2018.
Each sub-chart has y-axis 'Weekly Median Childcare Cost ($)' with labels 150, 200, 250 and 300.
There is a legend indicating colour is used to show Development 
Stage, with 2 levels:
infant shown as brilliant blue colour and 
toddler shown as deep orange yellow colour.
There is a legend indicating shape is used to show Development 
Stage, with 2 levels:
infant shown as solid circle shape and 
toddler shown as solid triangle shape.
Each sub-chart has 2 layers.
Panel 1 represents data for county_name = Orange County.
Layer 1 of panel 1 is a set of 22 points of which about 100% can be seen.
Layer 2 of panel 1 is a set of 2 lines.
Line 1 connects 11 points.
This line has colour brilliant blue which maps to Development Stage = infant.
Line 2 connects 11 points.
This line has colour deep orange yellow which maps to Development Stage = toddler.
Panel 2 represents data for county_name = San Francisco County.
Layer 1 of panel 2 is a set of 22 points of which about 100% can be seen.
Layer 2 of panel 2 is a set of 2 lines.
Line 1 connects 11 points.
This line has colour brilliant blue which maps to Development Stage = infant.
Line 2 connects 11 points.
This line has colour deep orange yellow which maps to Development Stage = toddler.
Panel 3 represents data for county_name = San Luis Obispo County.
Layer 1 of panel 3 is a set of 22 points of which about 100% can be seen.
Layer 2 of panel 3 is a set of 2 lines.
Line 1 connects 11 points.
This line has colour brilliant blue which maps to Development Stage = infant.
Line 2 connects 11 points.
This line has colour deep orange yellow which maps to Development Stage = toddler.

To do

Data Narrative Visualization

  • Bring virtual draft to class for peer feedback on Monday, May 6th

One-number Story

  • Watch for a Canvas announcement

Read CwD 7.1 - 7.5

  • for class next Wednesday